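Compression Formats
The following table outlines the possible quantization and sparsity compression formats applied to a model during compression. The format is determined by the quantization scheme and the sparsity type. For more details on the quantization schemes, see guides/compression_schemes.md.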
| Quantization | Sparsity | Quant Compressor | Sparsity Compressor |
|---|---|---|---|
| W8A8 - int | None | int_quantized | Dense |
| W8A8 - float | None | float_quantized | Dense |
| W4A16 - float | None | nvfp4_pack_quantized | Dense |
| W4A4 - float | None | nvfp4_pack_quantized | Dense |
| W4A16 - int | None | pack_quantized | Dense |
| W8A16 - int | None | pack_quantized | Dense |
| W8A16 - float | None | naive_quantized | Dense |
| W8A8 - int | 2:4 | int_quantized | Sparse24 |
| W8A8 - float | 2:4 | float_quantized | Sparse24 |
| W4A16 - int | 2:4 | marlin_24 | Dense |
| W8A16 - int | 2:4 | marlin_24 | Dense |
| W8A16 - float | 2:4 | naive_quantized | Dense |
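
The table can be read as a lookup from (quantization scheme, data type, sparsity) to a pair of compressors. The sketch below transcribes the rows into a plain Python dictionary for illustration only. The names `Compressors`, `FORMAT_TABLE`, and `lookup_compressors` are hypothetical and not part of any library API; the actual format inference in the compression tooling may consider additional factors.

```python
from typing import NamedTuple, Optional


class Compressors(NamedTuple):
    """Pair of compressor names taken from one row of the table."""
    quant: str
    sparsity: str


# Hypothetical transcription of the table above.
# Key: (scheme, dtype, sparsity); sparsity is None for dense models.
FORMAT_TABLE = {
    ("W8A8", "int", None): Compressors("int_quantized", "Dense"),
    ("W8A8", "float", None): Compressors("float_quantized", "Dense"),
    ("W4A16", "float", None): Compressors("nvfp4_pack_quantized", "Dense"),
    ("W4A4", "float", None): Compressors("nvfp4_pack_quantized", "Dense"),
    ("W4A16", "int", None): Compressors("pack_quantized", "Dense"),
    ("W8A16", "int", None): Compressors("pack_quantized", "Dense"),
    ("W8A16", "float", None): Compressors("naive_quantized", "Dense"),
    ("W8A8", "int", "2:4"): Compressors("int_quantized", "Sparse24"),
    ("W8A8", "float", "2:4"): Compressors("float_quantized", "Sparse24"),
    ("W4A16", "int", "2:4"): Compressors("marlin_24", "Dense"),
    ("W8A16", "int", "2:4"): Compressors("marlin_24", "Dense"),
    ("W8A16", "float", "2:4"): Compressors("naive_quantized", "Dense"),
}


def lookup_compressors(
    scheme: str, dtype: str, sparsity: Optional[str] = None
) -> Compressors:
    """Return the compressor pair for a table row, or raise ValueError."""
    key = (scheme.upper(), dtype.lower(), sparsity)
    try:
        return FORMAT_TABLE[key]
    except KeyError:
        raise ValueError(f"No table entry for {key}") from None


if __name__ == "__main__":
    # A 2:4-sparse W4A16 integer model maps to the marlin_24 quant compressor.
    print(lookup_compressors("W4A16", "int", "2:4"))
    # A dense W8A8 float model maps to float_quantized.
    print(lookup_compressors("W8A8", "float"))
```

Note that in this table only the W8A8 rows with 2:4 sparsity use the `Sparse24` sparsity compressor; the other 2:4 rows keep a `Dense` sparsity compressor.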