Compression Formats

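The following table summarizes the quantization and sparsity compression formats that can be applied to a model during compression. Which format is used depends on the quantization scheme and the sparsity type. For more details on the quantization schemes, see guides/compression_schemes.md.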

| Quantization  | Sparsity | Quant Compressor     | Sparsity Compressor |
|---------------|----------|----------------------|---------------------|
| W8A8 - int    | None     | int_quantized        | Dense               |
| W8A8 - float  | None     | float_quantized      | Dense               |
| W4A16 - float | None     | nvfp4_pack_quantized | Dense               |
| W4A4 - float  | None     | nvfp4_pack_quantized | Dense               |
| W4A16 - int   | None     | pack_quantized       | Dense               |
| W8A16 - int   | None     | pack_quantized       | Dense               |
| W8A16 - float | None     | naive_quantized      | Dense               |
| W8A8 - int    | 2:4      | int_quantized        | Sparse24            |
| W8A8 - float  | 2:4      | float_quantized      | Sparse24            |
| W4A16 - int   | 2:4      | marlin_24            | Dense               |
| W8A16 - int   | 2:4      | marlin_24            | Dense               |
| W8A16 - float | 2:4      | naive_quantized      | Dense               |
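The table can be read as a lookup from (scheme, numeric type, sparsity) to a pair of compressor names. The sketch below transcribes it into a dictionary to make that mapping explicit. It is illustrative only: `Formats`, `FORMAT_TABLE`, and `select_formats` are hypothetical names, not an API of any compression library, and the string values are copied verbatim from the table rather than from library constants.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Formats(NamedTuple):
    """Hypothetical container: the two compressors named in one table row."""
    quant_compressor: str
    sparsity_compressor: str


# Hypothetical lookup table transcribed from the rows above.
# Key: (scheme, numeric type, sparsity structure or None for dense weights).
FORMAT_TABLE: Dict[Tuple[str, str, Optional[str]], Formats] = {
    ("W8A8", "int", None): Formats("int_quantized", "Dense"),
    ("W8A8", "float", None): Formats("float_quantized", "Dense"),
    ("W4A16", "float", None): Formats("nvfp4_pack_quantized", "Dense"),
    ("W4A4", "float", None): Formats("nvfp4_pack_quantized", "Dense"),
    ("W4A16", "int", None): Formats("pack_quantized", "Dense"),
    ("W8A16", "int", None): Formats("pack_quantized", "Dense"),
    ("W8A16", "float", None): Formats("naive_quantized", "Dense"),
    ("W8A8", "int", "2:4"): Formats("int_quantized", "Sparse24"),
    ("W8A8", "float", "2:4"): Formats("float_quantized", "Sparse24"),
    ("W4A16", "int", "2:4"): Formats("marlin_24", "Dense"),
    ("W8A16", "int", "2:4"): Formats("marlin_24", "Dense"),
    ("W8A16", "float", "2:4"): Formats("naive_quantized", "Dense"),
}


def select_formats(scheme: str, dtype: str, sparsity: Optional[str] = None) -> Formats:
    """Hypothetical helper: return the compressors listed for a combination.

    Raises ValueError for combinations the table does not list.
    """
    key = (scheme, dtype, sparsity)
    if key not in FORMAT_TABLE:
        raise ValueError(f"no compression format listed for {key}")
    return FORMAT_TABLE[key]


if __name__ == "__main__":
    print(select_formats("W4A16", "int"))
    print(select_formats("W8A8", "float", "2:4"))
```

Note how the sparsity column changes only part of the result: for W8A8 the quantization compressor stays the same and the sparsity compressor switches from Dense to Sparse24, while for W4A16/W8A16 int the quantization compressor itself switches to marlin_24.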