llmcompressor.entrypoints.oneshot

Oneshot compression entrypoint for post-training model optimization.

Provides the main oneshot compression entrypoint for applying quantization, pruning, and other compression techniques to pretrained models without additional training. Supports calibration-based compression and offers various pipeline configurations for efficient model optimization.
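
For orientation, a minimal sketch of calling the entrypoint. The model id, recipe path, and output directory are illustrative placeholders, not library defaults; "open_platypus" is used as a stand-in calibration dataset.

from llmcompressor import oneshot

# Minimal illustrative call; all ids and paths below are placeholders.
model = oneshot(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # HF Hub id or local path
    recipe="my_recipe.yaml",              # compression instructions
    dataset="open_platypus",              # calibration data
    num_calibration_samples=512,
    output_dir="./compressed-model",      # saved only if not None
)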

  • Oneshot

    Class responsible for carrying out one-shot calibration on a pretrained model.

Functions

  • oneshot

    Performs oneshot calibration on a model.

Oneshot

Oneshot(log_dir: str | None = None, **kwargs)

Class responsible for carrying out one-shot calibration on a pretrained model.

This class handles the entire lifecycle of one-shot calibration, including preprocessing (model and tokenizer/processor initialization), model optimization (quantization or sparsification), and postprocessing (saving outputs). The instructions for model optimization can be specified through a recipe.

  • Input Keyword Arguments: kwargs are parsed into

    • model_args: Arguments for loading and configuring a pretrained model (e.g., AutoModelForCausalLM).
    • dataset_args: Arguments for dataset-related configurations, such as calibration dataloaders.
    • recipe_args: Arguments for defining and configuring recipes that specify the optimization actions.

    The parsers are defined in src/llmcompressor/args/; a sketch of this routing appears after the usage example below.

  • Lifecycle Overview: The one-shot calibration lifecycle consists of three steps

    1. Preprocessing:
      • Instantiates a pretrained model and tokenizer/processor.
      • Ensures input and output embedding layers are untied if they share tensors.
      • Patches the model to include additional functionality for saving with quantization configurations.
    2. Oneshot Calibration:
      • Optimizes the model with a global CompressionSession, applying recipe-defined modifiers (e.g., GPTQModifier, SparseGPTModifier).
    3. Postprocessing:
      • Saves the model, tokenizer/processor, and configuration to the specified output_dir.
  • Usage

    oneshot = Oneshot(model=model, recipe=recipe, dataset=dataset)
    oneshot()
    
    # Access the processed components
    model = oneshot.model
    processor = oneshot.processor
    recipe = oneshot.recipe
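
The keyword routing described under "Input Keyword Arguments" can be pictured as follows. parse_args is the helper actually called in __init__ (see the source below), and the import location follows the src/llmcompressor/args/ note above; the argument values are placeholders.

from llmcompressor.args import parse_args

# Illustrative sketch of how Oneshot(**kwargs) routes its inputs.
model_args, dataset_args, recipe_args, output_dir = parse_args(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # -> model_args (placeholder id)
    dataset="open_platypus",             # -> dataset_args
    recipe="my_recipe.yaml",             # -> recipe_args (placeholder path)
    output_dir="./compressed-model",     # returned separately
)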
    

Methods:

__init__(**kwargs):
    Initializes the `Oneshot` object by parsing input arguments, performing
    preprocessing, and setting instance attributes.

__call__(**kwargs):
    Performs the one-shot calibration process by preparing a calibration
    dataloader, applying recipe modifiers to the model, and executing
    postprocessing steps.

save():
    Saves the calibrated model and tokenizer/processor to the specified
    `output_dir`. Supports saving in compressed formats based on model
    arguments.

apply_recipe_modifiers(calibration_dataloader, **kwargs):
    Applies lifecycle actions (e.g., `initialize`, `finalize`) using modifiers
    defined in the recipe. Each action is executed via the global
    `CompressionSession`.

Initializes the Oneshot class with the provided arguments.

Parses the input keyword arguments into model_args, dataset_args, and recipe_args. Performs preprocessing to initialize the model and tokenizer/processor.

Parameters

  • model_args

    ModelArguments parameters, responsible for controlling model loading and saving logic

  • dataset_args

    DatasetArguments parameters, responsible for controlling dataset loading, preprocessing, and dataloader creation

  • recipe_args

    RecipeArguments parameters, responsible for containing recipe-related parameters

  • output_dir

    Path to save the output model after carrying out oneshot

  • log_dir

    (str | None, default: None ) –

    Path to save logs during the oneshot run. Nothing is logged to file if None.
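
File logging can be configured in two ways, mirroring the precedence in the __init__ source below: the LLM_COMPRESSOR_LOG_FILE environment variable wins over log_dir. The paths and model id in this sketch are placeholders.

import os
from llmcompressor.entrypoints.oneshot import Oneshot

# Option 1: log to an explicit file (takes precedence over log_dir).
os.environ["LLM_COMPRESSOR_LOG_FILE"] = "/tmp/llmc/run.log"

# Option 2: with the env var unset, a timestamped
# oneshot_<date>.log is created inside log_dir.
one_shot = Oneshot(model="Qwen/Qwen2.5-0.5B-Instruct", log_dir="./logs")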

Methods

Source code in llmcompressor/entrypoints/oneshot.py
def __init__(
    self,
    log_dir: str | None = None,
    **kwargs,
):
    """
    Initializes the `Oneshot` class with provided arguments.

    Parses the input keyword arguments into `model_args`, `dataset_args`, and
    `recipe_args`. Performs preprocessing to initialize the model and
    tokenizer/processor.

    :param model_args: ModelArguments parameters, responsible for controlling
        model loading and saving logic
    :param dataset_args: DatasetArguments parameters, responsible for controlling
        dataset loading, preprocessing and dataloader loading
    :param recipe_args: RecipeArguments parameters, responsible for containing
        recipe-related parameters
    :param output_dir: Path to save the output model after carrying out oneshot
    :param log_dir: Path to save logs during oneshot run.
        Nothing is logged to file if None.
    """
    # Set up file logging (no default files):
    # 1) If LLM_COMPRESSOR_LOG_FILE is set, log to that file.
    # 2) Else, if an explicit log_dir is provided, create a timestamped file there.
    log_file = os.environ.get("LLM_COMPRESSOR_LOG_FILE", "").strip()
    if log_file:
        p = Path(log_file).expanduser()
        p.parent.mkdir(parents=True, exist_ok=True)
        logger.add(
            str(p),
            level="DEBUG",
        )
    elif log_dir:
        os.makedirs(log_dir, exist_ok=True)
        date_str = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        logger.add(
            f"{log_dir}/oneshot_{date_str}.log",
            level="DEBUG",
        )

    model_args, dataset_args, recipe_args, output_dir = parse_args(**kwargs)

    self.model_args = model_args
    self.dataset_args = dataset_args
    self.recipe_args = recipe_args
    self.output_dir = output_dir

    # initialize the model and processor
    pre_process(model_args, dataset_args, output_dir)

    # Set instance attributes
    self.model = self.model_args.model
    self.processor = self.model_args.processor
    self.recipe = self.recipe_args.recipe

apply_recipe_modifiers

apply_recipe_modifiers(
    calibration_dataloader: DataLoader | None,
    recipe_stage: str | None = None,
)

Applies recipe modifiers to the model during the lifecycle.

The modifiers are defined in the recipe and executed via lifecycle actions (initialize, finalize) through the global CompressionSession.

Source code in llmcompressor/entrypoints/oneshot.py
def apply_recipe_modifiers(
    self,
    calibration_dataloader: DataLoader | None,
    recipe_stage: str | None = None,
):
    """
    Applies recipe modifiers to the model during the lifecycle.

    The modifiers are defined in the recipe and executed via lifecycle actions
    (`initialize`, `finalize`) through the global `CompressionSession`.


    :param calibration_dataloader: Dataloader for calibration data.

    Raises:
        RuntimeError: If any modifier fails during execution.
    """

    session = active_session()
    session.reset()

    # (Helen INFERENG-661): validate recipe modifiers before initialization
    # Apply MoE calibration context for the entire calibration process
    with moe_calibration_context(
        self.model,
        calibrate_all_experts=self.dataset_args.moe_calibrate_all_experts,
    ):
        session.initialize(
            model=self.model,
            start=-1,
            recipe=self.recipe,
            recipe_stage=recipe_stage,
            recipe_args=self.recipe_args.recipe_args,
            calib_data=calibration_dataloader,
        )
        user_pipeline = self.dataset_args.pipeline
        pipeline = CalibrationPipeline.from_modifiers(
            session.lifecycle.recipe.modifiers, user=user_pipeline
        )

        pipeline(
            self.model,
            calibration_dataloader,
            self.dataset_args,
        )

    session.finalize()
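
For illustration, a hypothetical direct call to this method; in normal use, __call__ prepares the calibration dataloader and drives this internally. The model id and recipe path are placeholders, and passing None relies on the DataLoader | None signature above.

# Hypothetical direct use; __call__ normally drives this internally.
one_shot = Oneshot(model="Qwen/Qwen2.5-0.5B-Instruct", recipe="my_recipe.yaml")
one_shot.apply_recipe_modifiers(
    calibration_dataloader=None,  # signature permits None (e.g., data-free recipes)
    recipe_stage=None,            # or the name of a stage defined in the recipe
)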

oneshot

oneshot(
    model: str | PreTrainedModel,
    config_name: str | None = None,
    tokenizer: str | PreTrainedTokenizerBase | None = None,
    processor: str | ProcessorMixin | None = None,
    use_auth_token: bool = False,
    precision: str = "auto",
    tie_word_embeddings: bool = True,
    trust_remote_code_model: bool = False,
    save_compressed: bool = True,
    model_revision: str = "main",
    recipe: str | list[str] | None = None,
    recipe_args: list[str] | None = None,
    clear_sparse_session: bool = False,
    stage: str | None = None,
    dataset: str | Dataset | DatasetDict | None = None,
    dataset_config_name: str | None = None,
    dataset_path: str | None = None,
    splits: str | list[str] | dict[str, str] | None = None,
    batch_size: int = 1,
    data_collator: str | Callable = "truncation",
    num_calibration_samples: int = 512,
    shuffle_calibration_samples: bool = False,
    max_seq_length: int = 384,
    pad_to_max_length: bool = True,
    text_column: str = "text",
    concatenate_data: bool = False,
    streaming: bool = False,
    overwrite_cache: bool = False,
    preprocessing_num_workers: int | None = None,
    min_tokens_per_module: float | None = None,
    moe_calibrate_all_experts: bool = True,
    quantization_aware_calibration: bool = True,
    output_dir: str | None = None,
    log_dir: str | None = None,
    **kwargs,
) -> PreTrainedModel

Performs oneshot calibration on a model.

Model arguments

Parameters

  • model

    (str | PreTrainedModel) –

    A pretrained model identifier from huggingface.co/models or a path to a local model. Required parameter.

  • distill_teacher

    Teacher model (a trained text generation model) for distillation.

  • config_name

    (str | None, default: None ) –

    Pretrained config name or path, if not the same as model_name.

  • tokenizer

    (str | PreTrainedTokenizerBase | None, default: None ) –

    Pretrained tokenizer name or path, if not the same as model_name.

  • processor

    (str | ProcessorMixin | None, default: None ) –

    Pretrained processor name or path, if not the same as model_name.

  • use_auth_token

    (bool, default: False ) –

    Whether to use a Hugging Face auth token for private models.

  • precision

    (str, default: 'auto' ) –

    Precision to cast model weights to; defaults to auto.

  • tie_word_embeddings

    (bool, default: True ) –

    Whether the model's input and output word embeddings should be left tied if possible. False means always untie.

  • trust_remote_code_model

    (bool, default: False ) –

    Whether to allow custom models to execute their own modeling files.

  • save_compressed

    (bool, default: True ) –

    Whether to compress sparse models during save.

  • model_revision

    (str, default: 'main' ) –

    The specific model version to use (can be a branch name, tag, or commit id).

Recipe arguments

  • recipe

    (str | list[str] | None, default: None ) –

    Path to an LLM Compressor recipe, or a list of paths to multiple LLM Compressor recipes.

  • recipe_args

    (list[str] | None, default: None ) –

    List of recipe arguments to evaluate, in the format "key1=value1", "key2=value2".

  • clear_sparse_session

    (bool, default: False ) –

    Whether to clear CompressionSession/CompressionLifecycle data between runs.

  • stage

    (str | None, default: None ) –

    The stage of the recipe to use for oneshot.

Dataset arguments

  • dataset

    (str | Dataset | DatasetDict | None, default: None ) –

    The name of the dataset to use (via the datasets library).

  • dataset_config_name

    (str | None, default: None ) –

    The configuration name of the dataset to use.

  • dataset_path

    (str | None, default: None ) –

    Path to a custom dataset. Supports json, csv, dvc.

  • splits

    (str | list[str] | dict[str, str] | None, default: None ) –

    Optional percentages of each split to download.

  • batch_size

    (int, default: 1 ) –

    Calibration dataset batch size. During calibration, LLM Compressor disables lm_head output computations to reduce memory usage from large calibration batch sizes. Large batch sizes may result in excess padding or truncation, depending on the data_collator.

  • data_collator

    (str | Callable, default: 'truncation' ) –

    The function used to form a batch from the dataset. Can also be 'truncation' or 'padding' to truncate or pad non-uniform sequence lengths in a batch. Defaults to 'truncation'.

  • num_calibration_samples

    (int, default: 512 ) –

    Number of samples to use for one-shot calibration.

  • shuffle_calibration_samples

    (bool, default: False ) –

    Whether to shuffle the dataset before calibration.

  • max_seq_length

    (int, default: 384 ) –

    Maximum total input sequence length after tokenization.

  • pad_to_max_length

    (bool, default: True ) –

    Whether to pad all samples to max_seq_length.

  • text_column

    (str, default: 'text' ) –

    Key to use as the text input to the tokenizer/processor.

  • concatenate_data

    (bool, default: False ) –

    Whether to concatenate datapoints to fill max_seq_length.

  • streaming

    (bool, default: False ) –

    Set to True to stream data from a cloud dataset.

  • overwrite_cache

    (bool, default: False ) –

    Whether to overwrite the cached preprocessed datasets.

  • preprocessing_num_workers

    (int | None, default: None ) –

    Number of processes for dataset preprocessing.

  • min_tokens_per_module

    (float | None, default: None ) –

    Minimum percentage of tokens per module; relevant for MoE models.

  • moe_calibrate_all_experts

    (bool, default: True ) –

    Whether to calibrate all experts during MoE model calibration. When True, all experts see all tokens during calibration, ensuring proper quantization statistics. When False, only routed experts are used. Only relevant for MoE models. Default is True.

  • quantization_aware_calibration

    (bool, default: True ) –

    Whether to enable quantization-aware calibration in the sequential pipeline. When True, quantization is applied during the calibration forward pass. When False, quantization is disabled during the calibration forward pass. Default is True.

Miscellaneous arguments

  • output_dir

    (str | None, default: None ) –

    Path to save the output model after calibration. Nothing is saved if None.

  • log_dir

    (str | None, default: None ) –

    Path to save logs during the oneshot run. Nothing is logged to file if None.

Returns

  • PreTrainedModel

    The calibrated PreTrainedModel
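
As a concrete end-to-end sketch, the call below quantizes a model using a GPTQModifier recipe supplied inline as YAML. The model id and output path are placeholders, and the recipe schema is shown as an assumption; consult the recipe documentation for the authoritative format.

from llmcompressor import oneshot

# Inline YAML recipe (illustrative schema).
recipe = """
quant_stage:
    quant_modifiers:
        GPTQModifier:
            scheme: "W4A16"
            targets: ["Linear"]
            ignore: ["lm_head"]
"""

model = oneshot(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder id
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=384,
    num_calibration_samples=512,
    output_dir="./qwen2.5-0.5b-w4a16",   # placeholder path
)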

Source code in llmcompressor/entrypoints/oneshot.py
def oneshot(
    # Model arguments
    model: str | PreTrainedModel,
    config_name: str | None = None,
    tokenizer: str | PreTrainedTokenizerBase | None = None,
    processor: str | ProcessorMixin | None = None,
    use_auth_token: bool = False,
    precision: str = "auto",
    tie_word_embeddings: bool = True,
    trust_remote_code_model: bool = False,
    save_compressed: bool = True,
    model_revision: str = "main",
    # Recipe arguments
    recipe: str | list[str] | None = None,
    recipe_args: list[str] | None = None,
    clear_sparse_session: bool = False,
    stage: str | None = None,
    # Dataset arguments
    dataset: str | Dataset | DatasetDict | None = None,
    dataset_config_name: str | None = None,
    dataset_path: str | None = None,
    splits: str | list[str] | dict[str, str] | None = None,
    batch_size: int = 1,
    data_collator: str | Callable = "truncation",
    num_calibration_samples: int = 512,
    shuffle_calibration_samples: bool = False,
    max_seq_length: int = 384,
    pad_to_max_length: bool = True,
    text_column: str = "text",
    concatenate_data: bool = False,
    streaming: bool = False,
    overwrite_cache: bool = False,
    preprocessing_num_workers: int | None = None,
    min_tokens_per_module: float | None = None,
    moe_calibrate_all_experts: bool = True,
    quantization_aware_calibration: bool = True,
    # Miscellaneous arguments
    output_dir: str | None = None,
    log_dir: str | None = None,
    **kwargs,
) -> PreTrainedModel:
    """
    Performs oneshot calibration on a model.

    # Model arguments
    :param model: A pretrained model identifier from huggingface.co/models or a path
        to a local model. Required parameter.
    :param distill_teacher: Teacher model (a trained text generation model)
        for distillation.
    :param config_name: Pretrained config name or path if not the same as
        model_name.
    :param tokenizer: Pretrained tokenizer name or path if not the same as
        model_name.
    :param processor: Pretrained processor name or path if not the same as
        model_name.
    :param use_auth_token: Whether to use Hugging Face auth token for private
        models.
    :param precision: Precision to cast model weights to, default to auto.
    :param tie_word_embeddings: Whether the model's input and output word embeddings
        should be left tied if possible. False means always untie.
    :param trust_remote_code_model: Whether to allow for custom models to execute
        their own modeling files.
    :param save_compressed: Whether to compress sparse models during save.
    :param model_revision: The specific model version to use (can be branch name,
        tag, or commit id).

    # Recipe arguments
    :param recipe: Path to a LLM Compressor recipe, or a list of paths
      to multiple LLM Compressor recipes.
    :param recipe_args: List of recipe arguments to evaluate, in the
        format "key1=value1", "key2=value2".
    :param clear_sparse_session: Whether to clear CompressionSession/
        CompressionLifecycle data between runs.
    :param stage: The stage of the recipe to use for oneshot.

    # Dataset arguments
    :param dataset: The name of the dataset to use (via the datasets
        library).
    :param dataset_config_name: The configuration name of the dataset
        to use.
    :param dataset_path: Path to a custom dataset. Supports json, csv, dvc.
    :param splits: Optional percentages of each split to download.
    :param batch_size: calibration dataset batch size. During calibration,
        LLM Compressor disables lm_head output computations to reduce memory
        usage from large calibration batch sizes. Large batch sizes may result in
        excess padding or truncation, depending on the data_collator
    :param data_collator: The function used to form a batch from the dataset. Can
        also specify 'truncation' or 'padding' to truncate or pad non-uniform sequence
        lengths in a batch. Defaults to 'truncation'.
    :param num_calibration_samples: Number of samples to use for one-shot
        calibration.
    :param shuffle_calibration_samples: Whether to shuffle the dataset before
        calibration.
    :param max_seq_length: Maximum total input sequence length after tokenization.
    :param pad_to_max_length: Whether to pad all samples to `max_seq_length`.
    :param text_column: Key to use as the `text` input to tokenizer/processor.
    :param concatenate_data: Whether to concatenate datapoints to fill
        max_seq_length.
    :param streaming: True to stream data from a cloud dataset.
    :param overwrite_cache: Whether to overwrite the cached preprocessed datasets.
    :param preprocessing_num_workers: Number of processes for dataset preprocessing.
    :param min_tokens_per_module: Minimum percentage of tokens per
        module, relevant for MoE models.
    :param moe_calibrate_all_experts: Whether to calibrate all experts during MoE
        model calibration. When True, all experts will see all tokens during
        calibration, ensuring proper quantization statistics. When False, only
        routed experts will be used. Only relevant for MoE models. Default is True.
    :param quantization_aware_calibration: Whether to enable quantization-aware
        calibration in the sequential pipeline. When True, quantization is applied
        during forward pass in calibration. When False, quantization is disabled
        during forward pass in calibration. Default is set to True.

    # Miscellaneous arguments
    :param output_dir: Path to save the output model after calibration.
        Nothing is saved if None.
    :param log_dir: Path to save logs during oneshot run.
        Nothing is logged to file if None.

    :return: The calibrated PreTrainedModel
    """

    # pass all args directly into Oneshot
    local_args = {
        k: v for k, v in locals().items() if k not in ("local_args", "kwargs")
    }
    one_shot = Oneshot(**local_args, **kwargs)
    one_shot()

    return one_shot.model