Migrating from AffineQuantizedTensor + Layouts to new structure of tensor subclasses #2752

@jerryzh168

Update: Our team will evaluate this more before outsourcing the migration to more people in the community

Context:
Previously we used AffineQuantizedTensor for many of our use cases, including int4, float8, intx, and floatx. It introduces some complicated abstractions such as Layout; people have said it is a bit hard to understand, and there are many indirections in the code.

In an effort to simplify the code base and make it easier to contribute to, we have been adding new features with a different structure in mind. We now want to organize tensors by "dtype" and "packing_format", e.g. we'll have Int4PreshuffledTensor, Int8Tensor, and Float8Tensor instead of a single AffineQuantizedTensor with multiple layouts.
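To make the structural difference concrete, here is a minimal, framework-free sketch. It is not torchao code: every name prefixed with `Toy` is hypothetical, and the symmetric per-tensor int8 scheme is just an assumption chosen to keep the example short. It only contrasts "one generic tensor class plus a layout object" with "one class per dtype and packing format".

```python
from dataclasses import dataclass
from typing import List


# --- Old style (toy): one generic class, behaviour selected via a layout ---
@dataclass
class ToyLayout:
    name: str  # hypothetical layout tag, e.g. "plain"


@dataclass
class ToyAffineQuantizedTensor:
    data: List[int]
    scale: float
    layout: ToyLayout

    def dequantize(self) -> List[float]:
        # The reader has to follow the layout to learn how data is stored.
        if self.layout.name == "plain":
            return [q * self.scale for q in self.data]
        raise NotImplementedError(f"unsupported layout: {self.layout.name}")


# --- New style (toy): the class itself names the dtype and packing ---
@dataclass
class ToyInt8Tensor:
    qdata: List[int]  # plain (unpacked) int8 values
    scale: float      # single per-tensor scale (an assumption of this sketch)

    @classmethod
    def from_float(cls, values: List[float]) -> "ToyInt8Tensor":
        # Symmetric quantization; fall back to 1.0 if every value is zero.
        max_abs = max(abs(v) for v in values) or 1.0
        scale = max_abs / 127
        qdata = [max(-128, min(127, round(v / scale))) for v in values]
        return cls(qdata, scale)

    def dequantize(self) -> List[float]:
        return [q * self.scale for q in self.qdata]


weights = [0.5, -1.0, 0.25]
new_style = ToyInt8Tensor.from_float(weights)
old_style = ToyAffineQuantizedTensor(
    new_style.qdata, new_style.scale, ToyLayout("plain")
)
print(new_style.dequantize())
print(old_style.dequantize())
```

Both objects hold the same numbers; the difference is that the new-style class has no layout indirection to follow, which is the readability gain this migration is after.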

Please check out our updated docs for the new tensor subclass organization structure and guide for design:

migration status

| inference config name | current status | plan | POC | status |
| --- | --- | --- | --- | --- |
| MXFPInferenceConfig | built on v2 | n/a | - | done |
| NVFP4InferenceConfig | built on v2 | n/a | - | done |
| Float8DynamicActivationInt4WeightConfig | built on v2 | n/a | - | done |
| Int4WeightOnlyConfig | v2 and v1 exists | deprecate v1 | ? | #3513 |
| Int8DynamicActivationIntxWeightConfig | v2 and v1 exists | deprecate v1 | ? | #3511 |
| Float8WeightOnlyConfig | v2 and v1 exists | deprecate v1 | ? | #3510 |
| Float8DynamicActivationFloat8WeightConfig | v2 and v1 exists | deprecate v1 | ? | #3510 |
| IntxWeightOnlyConfig | v2 and v1 exists | deprecate v1 | ? | #3512 |
| Float8DynamicActivationFloat8SemiSparseWeightConfig | v1 exists | add a new sparse packing_format for float8 dynamic quant config, then deprecate v1 | ? | #3361 |
| Int8WeightOnlyConfig | v1 exists | create v2, then deprecate v1 | ? | #3407 |
| Int8DynamicActivationInt8WeightConfig | v1 exists | create v2, then deprecate v1 | ? | #3407 |
| Int8DynamicActivationInt4WeightConfig | v1 exists | move to prototype | ? | #3491 |
| Int4DynamicActivationInt4WeightConfig | v1 exists | move to prototype | ? | #3491 |
| GemliteUIntXWeightOnlyConfig | v1 exists | move to prototype | ? | #3491 |
| Float8StaticActivationFloat8WeightConfig | v1 exists | move to prototype | ? | #3491 |
| UIntXWeightOnlyConfig | v1 exists | move to prototype | ? | #3491 |
| FPXWeightOnlyConfig | v1 exists | move to prototype | ? | #3491 |

appendix

List of things to migrate:
INT8

[migration done, TODO: delete old path after all migration is done] INT4 weight only

[move to prototype] INT4 weight + int8 activation

UINTx Weight Only

[migration done, TODO: delete old path after all migration is done] Int8DynamicActivationIntxWeightConfig

FP8

FPx
