Question about the Attention layer parameter count calculation #376

@kloop3

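In code1 of AI 计算集群概述 (AI Computing Cluster Overview), is there a problem with the formula used to compute the Attention layer parameter count? For standard multi-head attention (ignoring techniques such as GQA), shouldn't the parameter count be $P_{attn\_per\_layer} = (d_{model} \times d_{model})_Q \times 8 + (d_{model} \times d_{model})_K \times 8 + (d_{model} \times d_{model})_V \times 8 + (d_{model} \times d_{model})_O$? In other words, the Q, K, and V projections should each have `d_model * d_model * n_heads` parameters.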
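To make the two readings concrete, here is a minimal sketch that evaluates both counts. It assumes the common formulation of multi-head attention in which each head projects to `d_head = d_model / n_heads`, so the per-head matrices for Q, K, and V each sum to `d_model * d_model` across all heads and the `n_heads` factor cancels. The function names and the example values `d_model = 512`, `n_heads = 8` are illustrative assumptions and are not taken from code1.

```python
def mha_param_count(d_model: int, n_heads: int, bias: bool = False) -> int:
    """Parameter count of one multi-head attention layer (no GQA/MQA).

    Assumes each head has width d_head = d_model // n_heads.
    """
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    d_head = d_model // n_heads

    # Each head owns a (d_model x d_head) slice of W_Q, W_K and W_V.
    per_head_qkv = 3 * d_model * d_head
    qkv = n_heads * per_head_qkv          # equals 3 * d_model * d_model
    # W_O maps the concatenated heads (n_heads * d_head) back to d_model.
    out = (n_heads * d_head) * d_model

    total = qkv + out
    if bias:
        total += 4 * d_model              # one bias vector per projection
    return total


def issue_formula_param_count(d_model: int, n_heads: int) -> int:
    """Evaluate the formula proposed in the issue, exactly as written:
    Q, K and V each count d_model * d_model * n_heads, O counts d_model * d_model.
    """
    return 3 * n_heads * d_model * d_model + d_model * d_model


if __name__ == "__main__":
    d_model, n_heads = 512, 8             # illustrative values only
    print("d_head = d_model / n_heads:", mha_param_count(d_model, n_heads))
    print("formula from the issue:    ", issue_formula_param_count(d_model, n_heads))
```

The formula in the issue would apply only if every head used a full `d_model`-wide projection instead of a `d_model / n_heads` slice. Which convention code1 follows has to be checked against its definition of the head dimension.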
