
llmcompressor.transformers.data.wikitext

WikiTextDataset

WikiTextDataset(
    dataset_args: DatasetArguments,
    split: str,
    processor: Processor,
)

Bases: TextGenerationDataset

Child text generation class for the WikiText dataset

Parameters

  • dataset_args

    (DatasetArguments) –

    Configuration settings for loading the dataset

  • split

    (str) –

    Split to load from the dataset, e.g. test or train[:5%]

  • processor

    (Processor) –

    Processor or tokenizer to use on the dataset

Source code in llmcompressor/transformers/data/wikitext.py
def __init__(
    self, dataset_args: "DatasetArguments", split: str, processor: Processor
):
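    # Copy the arguments so the caller's object is not mutated.
    dataset_args = deepcopy(dataset_args)
    # Pin the dataset name and text column used for WikiText.
    dataset_args.dataset = "Salesforce/wikitext"
    dataset_args.text_column = "text"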

    super().__init__(
        dataset_args=dataset_args,
        split=split,
        processor=processor,
    )
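
The constructor above copies the incoming arguments before overriding the dataset name and text column, so the object the caller passed in is left unchanged. The sketch below illustrates that pattern in isolation. StubDatasetArguments, StubTextGenerationDataset, and StubWikiTextDataset are hypothetical stand-ins written for this example, not the real llmcompressor classes, and they model only the two fields touched here.

```python
from copy import deepcopy
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StubDatasetArguments:
    """Hypothetical stand-in for DatasetArguments (only two fields modelled)."""

    dataset: Optional[str] = None
    text_column: str = "text"


class StubTextGenerationDataset:
    """Hypothetical stand-in for TextGenerationDataset; it only stores its inputs."""

    def __init__(self, dataset_args: StubDatasetArguments, split: str, processor: Any):
        self.dataset_args = dataset_args
        self.split = split
        self.processor = processor


class StubWikiTextDataset(StubTextGenerationDataset):
    """Mirrors the constructor shown above, using the stand-in classes."""

    def __init__(self, dataset_args: StubDatasetArguments, split: str, processor: Any):
        # Work on a copy so the caller's object is left untouched.
        dataset_args = deepcopy(dataset_args)
        dataset_args.dataset = "Salesforce/wikitext"
        dataset_args.text_column = "text"
        super().__init__(dataset_args=dataset_args, split=split, processor=processor)


# The caller's arguments point at a different dataset and column (made-up values).
caller_args = StubDatasetArguments(dataset="some/other-dataset", text_column="body")
wikitext = StubWikiTextDataset(caller_args, split="train[:5%]", processor=None)

print(wikitext.dataset_args.dataset)  # Salesforce/wikitext
print(caller_args.dataset)            # some/other-dataset
```

Because of the copy, the same arguments object can be reused to build other datasets without carrying over the WikiText overrides.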