llmcompressor.transformers.compression.sparsity_metadata_config

  • SparsityConfigMetadata

    Class of helper functions for filling out a SparsityCompressionConfig based on readable metadata from the model

SparsityConfigMetadata

Class of helper functions for filling out a SparsityCompressionConfig based on readable metadata from the model

Methods

fill_config_details staticmethod

fill_config_details(
    config: SparsityCompressionConfig,
    model: Module,
    state_dict: dict[str, Tensor] | None = None,
)

Fills in informational sparsity parameters from a given model

Parameters

  • config

    (SparsityCompressionConfig) –

    sparsity config to fill in

  • model

    (Module) –

    PyTorch model to infer config parameters from

  • state_dict

    (dict[str, Tensor] | None, default: None) –

    optional state_dict to replace the one in the model, used for gathering global FSDP model info

Source code in llmcompressor/transformers/compression/sparsity_metadata_config.py
@staticmethod
def fill_config_details(
    config: SparsityCompressionConfig,
    model: Module,
    state_dict: dict[str, Tensor] | None = None,
):
    """
    Fills in informational sparsity parameters from a given model

    :param config: sparsity config to fill in
    :param model: pytorch model to infer config parameters from
    :param state_dict: optional state_dict to replace that in model, used for
    gathering global FSDP model info
    """
    config.global_sparsity = SparsityConfigMetadata.infer_global_sparsity(
        model, state_dict=state_dict
    )
    config.sparsity_structure = SparsityConfigMetadata.infer_sparsity_structure()
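
A minimal usage sketch follows. It assumes SparsityConfigMetadata is importable from the module path in this page's title and that a dense SparsityCompressionConfig can be built through compressed-tensors' registry, mirroring the load_from_registry call shown in from_pretrained below; adjust imports to your installed versions.

# Minimal sketch under assumed import paths
import torch
from compressed_tensors.config import CompressionFormat, SparsityCompressionConfig
from llmcompressor.transformers.compression.sparsity_metadata_config import (
    SparsityConfigMetadata,
)

# Toy model with one layer zeroed out so the inferred global sparsity is nonzero
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.Linear(64, 64))
with torch.no_grad():
    model[0].weight.zero_()

# Start from a dense config and let the helper fill in the sparsity fields
config = SparsityCompressionConfig.load_from_registry(CompressionFormat.dense.value)
SparsityConfigMetadata.fill_config_details(config, model)
print(config.global_sparsity, config.sparsity_structure)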

from_pretrained staticmethod

from_pretrained(
    model: Module,
    state_dict: dict[str, Tensor] | None = None,
    compress: bool = False,
    quantization_format: CompressionFormat | None = None,
    disable_sparse_compression: bool = False,
    sparsity_structure: str | None = None,
) -> SparsityCompressionConfig | None

Determines the compression type and informational parameters for a given model

Parameters

  • model

    (Module) –

    PyTorch model to calculate the sparsity config for

  • state_dict

    (dict[str, Tensor] | None, default: None) –

    optional state_dict to replace the one in the model, used for gathering global FSDP model info

  • compress

    (bool, default: False) –

    whether or not to compress the model on disk

  • quantization_format

    (CompressionFormat | None, default: None) –

    the quantization compression format being used for the model

  • disable_sparse_compression

    (bool, default: False) –

    whether or not to compress the model with sparse compressors; if True, the sparse compression format will be dense. Default is False.

  • sparsity_structure

    (str | None, default: None) –

    sparsity structure for the model. Providing it directly skips the step of inferring it from the model

Returns

  • SparsityCompressionConfig | None

    compression config inferred from the model

Source code in llmcompressor/transformers/compression/sparsity_metadata_config.py
@staticmethod
def from_pretrained(
    model: Module,
    state_dict: dict[str, Tensor] | None = None,
    compress: bool = False,
    quantization_format: CompressionFormat | None = None,
    disable_sparse_compression: bool = False,
    sparsity_structure: str | None = None,
) -> SparsityCompressionConfig | None:
    """
    Determines compression type and informational parameters for a given model

    :param model: pytorch model to calculate sparsity config for
    :param state_dict: optional state_dict to replace that in model, used for
    gathering global FSDP model info
    :param compress: whether or not to compress the model on disk
    :param quantization_format: the quantization compression format being used
        for the model
    :param disable_sparse_compression: whether or not to compress the model with
        sparse compressors, If True, the sparse compression format will
        be dense, default is False.
    :param sparsity_structure: sparsity structure for the model. Providing it as
        input will skip the step to infer it from the model directly
    :return: compression config inferred from the model
    """
    # TODO: can we remove this? Do we need the state dict?
    global_sparsity = SparsityConfigMetadata.infer_global_sparsity(
        model, state_dict=state_dict
    )

    if sparsity_structure is None:
        sparsity_structure = SparsityConfigMetadata.infer_sparsity_structure(
            model=model
        )

    if (
        disable_sparse_compression
        or quantization_format == CompressionFormat.marlin_24
    ):
        # sparse compressor should be dense
        # when no_sparse_compression is True
        # or when marlin_24 is used
        format = CompressionFormat.dense.value
    elif compress and SparsityConfigMetadata.is_sparse24_bitmask_supported(
        model, sparsity_structure
    ):
        format = CompressionFormat.sparse_24_bitmask.value
    else:
        format = CompressionFormat.dense.value

    # TODO: eventually should be done similar to quantization
    # so we do not have to infer
    targets, ignores = infer_sparse_targets_and_ignores(
        model,
        sparsity_structure=sparsity_structure,
        sparsity_threshold=SparsityConfigMetadata.SPARSITY_THRESHOLD,
    )

    if not (targets or ignores):
        # no sparsity config
        # needed if targets/ignores are empty
        return None

    return SparsityCompressionConfig.load_from_registry(
        format,
        global_sparsity=global_sparsity,
        sparsity_structure=sparsity_structure,
        targets=targets,
        ignore=ignores,
    )
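
A usage sketch under the same assumed import path. The toy model below has no meaningful sparsity, so None is the likely return value; a genuinely pruned model saved with compress=True would instead yield a sparse_24_bitmask or dense config.

import torch
from llmcompressor.transformers.compression.sparsity_metadata_config import (
    SparsityConfigMetadata,
)

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.Linear(64, 64))

sparsity_config = SparsityConfigMetadata.from_pretrained(model, compress=True)
if sparsity_config is None:
    # no layer crossed SPARSITY_THRESHOLD, so no sparsity config is needed
    print("model is effectively dense")
else:
    print(sparsity_config.format, sparsity_config.global_sparsity)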

infer_global_sparsity staticmethod

infer_global_sparsity(
    model: Module,
    state_dict: dict[str, Tensor] | None = None,
) -> float

Calculates the global percentage of sparse (zero) weights in the model

Parameters

  • model

    (Module) –

    PyTorch model to infer the sparsity of

  • state_dict

    (dict[str, Tensor] | None, default: None) –

    optional state_dict to replace the one in the model, used for gathering global FSDP model info

Returns

  • float

    global sparsity of the model

Source code in llmcompressor/transformers/compression/sparsity_metadata_config.py
@staticmethod
def infer_global_sparsity(
    model: Module, state_dict: dict[str, Tensor] | None = None
) -> float:
    """
    Calculates the global percentage of sparse zero weights in the model

    :param model: pytorch model to infer sparsity of
    :param state_dict: optional state_dict to replace that in model, used for
    gathering global FSDP model info
    :return: global sparsity of model
    """

    info = ModuleSparsificationInfo(model, state_dict=state_dict)
    global_sparsity = info.params_sparse_percent / 100.0  # convert % to float
    return global_sparsity
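
A quick numeric sketch (assumed import path): zeroing one of two equally sized Linear weights puts roughly half of all parameters at zero, and the result comes back as a fraction rather than a percentage.

import torch
from llmcompressor.transformers.compression.sparsity_metadata_config import (
    SparsityConfigMetadata,
)

model = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.Linear(128, 128))
with torch.no_grad():
    model[0].weight.zero_()  # half of the weight parameters become zero

sparsity = SparsityConfigMetadata.infer_global_sparsity(model)
print(f"{sparsity:.2f}")  # ~0.50 (slightly lower because biases are counted too)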

infer_sparsity_structure staticmethod

infer_sparsity_structure(
    model: Module | None = None,
    check_only_modifiers: bool | None = False,
) -> str

Determines what sparsity structure, if any, was applied.

First, an attempt is made to infer the sparsity structure from the currently active sparse session.

If that fails, the sparsity structure is inferred from the model (if provided).

Finally, if both fail, the sparsity structure is set to "unstructured".

Returns

  • str

    sparsity structure as a string

Source code in llmcompressor/transformers/compression/sparsity_metadata_config.py
@staticmethod
def infer_sparsity_structure(
    model: Module | None = None, check_only_modifiers: bool | None = False
) -> str:
    """
    Determines what sparsity structure, if any, was applied.

    First, there is an attempt to deduce the sparsity structure
    from the currently active sparse session.

    If that fails, the sparsity structure is inferred from the
    model (if provided)

    Finally, if both fail, the sparsity structure is set to
    "unstructured"

    :return: sparsity structure as a string
    """
    sparsity_structure = None

    current_session = active_session()
    stage_modifiers = current_session.lifecycle.recipe.modifiers
    if stage_modifiers:
        sparsity_structure = infer_sparsity_structure_from_modifiers(
            stage_modifiers
        )

    if check_only_modifiers:
        return sparsity_structure

    if model and sparsity_structure is None:
        sparsity_structure = infer_sparsity_structure_from_model(model)

    return SparsityStructure(sparsity_structure).value
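
A sketch of the fallback path (assumed import path): with no sparsity modifiers in the active session and no 2:4 pattern in the toy model's weights, the method falls through to "unstructured".

import torch
from llmcompressor.transformers.compression.sparsity_metadata_config import (
    SparsityConfigMetadata,
)

model = torch.nn.Linear(64, 64)
structure = SparsityConfigMetadata.infer_sparsity_structure(model=model)
print(structure)  # "unstructured" unless a sparse session or 2:4 weights are detected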

is_sparse24_bitmask_supported staticmethod

is_sparse24_bitmask_supported(
    model: Module, sparsity_structure: str | None = None
) -> bool

Determines whether the sparse 24 bitmask compressor is supported in vLLM for a given model and its sparsity structure

Parameters

  • model

    (Module) –

    PyTorch model to check for sparse 24-bit sparsity support

  • sparsity_structure

    (str | None, default: None) –

    sparsity structure of the model; if not supplied, it will be inferred

Returns

  • bool

    whether or not sparse 24 bitmask compression is supported in vLLM for the given model

Source code in llmcompressor/transformers/compression/sparsity_metadata_config.py
@staticmethod
def is_sparse24_bitmask_supported(
    model: Module,
    sparsity_structure: str | None = None,
) -> bool:
    """
    Determines if sparse 24 bitmask sparse compressor is supported for a given model
    and its sparsity structure in vLLM

    :param model: pytorch model to check for sparse 24 bit sparsity support
    :param sparsity_structure: sparsity structure of the model, if
        not supplied it will be inferred
    :return: whether or not sparse 24 bitmask compression is supported
        in vLLM for the given model
    """
    if sparsity_structure is None:
        sparsity_structure = SparsityConfigMetadata.infer_sparsity_structure(model)

    if sparsity_structure != SparsityStructure.TWO_FOUR.value:
        # only supported for 2:4 sparsity
        return False

    if not is_model_quantized(model):
        logger.warning(
            "Compressed Sparse-only 2:4 models are not supported in vLLM<=0.7.0, "
            "consider saving with `disable_sparse_compression` set, "
            "`model.save_pretrained(..., disable_sparse_compression=True)`"
        )
        return True

    # when model is quantized, and has 2:4 sparsity

    supported_scheme_types: list[str] = [
        QuantizationType.INT.value,
        QuantizationType.FLOAT.value,
    ]

    for submodule in model.modules():
        if not is_module_quantized(submodule):
            continue

        weight_scheme = submodule.quantization_scheme.weights
        input_scheme = submodule.quantization_scheme.input_activations

        if weight_scheme and input_scheme:
            # weight and activation quantization
            # check schemes are supported
            for scheme in [weight_scheme, input_scheme]:
                scheme_supported = (
                    scheme.num_bits == 8 and scheme.type in supported_scheme_types
                )
                if not scheme_supported:
                    logger.info(
                        "Quantization scheme not supported,"
                        " turning off sparse 24 compression."
                        f" Invalid Scheme: {scheme}"
                    )
                    return False

        elif weight_scheme or input_scheme:
            # weight only quantization
            logger.info(
                "Weight only quantization detected, "
                "turning off sparse 24 compression."
            )
            return False

    return True
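
A sketch of the two early-exit branches (assumed import path, and assuming SparsityStructure.TWO_FOUR corresponds to the string "2:4"): anything other than 2:4 is rejected before the quantization checks run, while an unquantized model with 2:4 structure is accepted with a logged warning about vLLM<=0.7.0.

import torch
from llmcompressor.transformers.compression.sparsity_metadata_config import (
    SparsityConfigMetadata,
)

model = torch.nn.Linear(64, 64)

# Non-2:4 structures are rejected immediately
print(SparsityConfigMetadata.is_sparse24_bitmask_supported(model, "unstructured"))  # False

# An unquantized model with 2:4 structure passes, with a warning logged
print(SparsityConfigMetadata.is_sparse24_bitmask_supported(model, "2:4"))  # True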