int8_kv_cache scale #1557

Open
@liquanfeng

act_range.get(prefix + 'self_attn.q_proj')["y"],

I'm puzzled as to why the act_range of q_proj is used when computing the int8_kv_cache scale, since that scale is only used to quantize the outputs of k_proj and v_proj.
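
To make the concern concrete, here is a minimal sketch. It assumes symmetric per-tensor INT8 quantization with scale = amax / 127; the helper names (kv_scale, quantize, dequantize) and the calibration numbers are hypothetical and are not taken from the repository. When q_proj's output range is larger than those of k_proj and v_proj, folding it into the max inflates the scale and coarsens the quantization step applied to the cached K/V values.

```python
# Minimal sketch (assumption): symmetric per-tensor INT8 quantization with
# scale = amax / 127. Helper names and numbers are hypothetical, not repository APIs.


def kv_scale(amax_values):
    """Return a symmetric INT8 scale from a list of abs-max activation ranges."""
    return max(amax_values) / 127.0


def quantize(x, scale):
    """Quantize a float to INT8 (Python round, ties to even), clamped to [-127, 127]."""
    q = round(x / scale)
    return max(-127, min(127, q))


def dequantize(q, scale):
    """Map an INT8 value back to float."""
    return q * scale


# Hypothetical calibrated abs-max ranges of each projection's output ("y").
amax = {"q_proj": 20.0, "k_proj": 5.0, "v_proj": 4.0}

# Scale derived only from the tensors that are actually cached (K and V).
scale_kv_only = kv_scale([amax["k_proj"], amax["v_proj"]])
# Scale that also folds in q_proj's range.
scale_with_q = kv_scale([amax["q_proj"], amax["k_proj"], amax["v_proj"]])

# Round-trip a sample K value through each scale and measure the error.
k_value = 1.23
err_kv_only = abs(dequantize(quantize(k_value, scale_kv_only), scale_kv_only) - k_value)
err_with_q = abs(dequantize(quantize(k_value, scale_with_q), scale_with_q) - k_value)

print(f"scale (K/V only): {scale_kv_only:.5f}, round-trip error: {err_kv_only:.5f}")
print(f"scale (Q/K/V):    {scale_with_q:.5f}, round-trip error: {err_with_q:.5f}")
```

If q_proj's range is no larger than the K/V ranges, both scales coincide and including q_proj makes no numerical difference.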

Labels

Investigating
KV-Cache Management: kv-cache management for efficient LLM inference
triaged: Issue has been triaged by maintainers