Can the Transformer decoder be accelerated by the NPU? #2612

## 🌱 Describe Feature Request
I trained a Transformer model. When I converted the whole model into an mlmodel, I found that inference could only run on the CPU. After splitting it into an encoder and a decoder, I found that the encoder was accelerated by the NPU as expected, but the decoder still ran only on the CPU. Is this because the decoder is autoregressive, so its input shape is not static? If the decoder can be accelerated by the NPU, could you provide a method for converting the PyTorch (.pt) model to an mlmodel or mlpackage?

Thanks.
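For context on the static-shape question: a common workaround is to give the decoder a fixed-shape input. The token buffer is padded to a maximum length, and an attention mask marks which positions are real, so every decoding step calls the model with identical shapes. At conversion time, that fixed shape would then be declared, for example with `ct.TensorType(shape=(1, MAX_LEN))`, instead of a flexible range. Whether this alone moves the decoder onto the NPU is an assumption, not something confirmed here. The sketch below is stdlib-only and only illustrates the decoding loop. `MAX_LEN`, the token ids, and `toy_step` are hypothetical stand-ins for the real converted decoder.

```python
from typing import Callable, List, Sequence, Tuple

MAX_LEN = 8   # hypothetical fixed decoder length baked into the converted model
PAD_ID = 0    # hypothetical padding token id
BOS_ID = 1    # hypothetical start-of-sequence token id
EOS_ID = 2    # hypothetical end-of-sequence token id

# A decoder step maps (padded token buffer, padding mask) -> next token id.
DecoderStep = Callable[[Sequence[int], Sequence[int]], int]


def greedy_decode_fixed_shape(
    step: DecoderStep, max_len: int = MAX_LEN
) -> Tuple[List[int], List[int]]:
    """Greedy autoregressive decoding where every call sees inputs of length max_len.

    Returns the generated tokens and the input lengths seen by `step`,
    which are all equal to max_len (the shape never changes between steps).
    """
    tokens = [PAD_ID] * max_len   # padded token buffer, fixed length
    mask = [0] * max_len          # 1 = real token, 0 = padding to be ignored
    tokens[0], mask[0] = BOS_ID, 1
    seen_lengths: List[int] = []
    length = 1
    while length < max_len:
        seen_lengths.append(len(tokens))
        next_id = step(tokens, mask)
        tokens[length] = next_id  # write in place instead of growing the list
        mask[length] = 1
        length += 1
        if next_id == EOS_ID:
            break
    return tokens[:length], seen_lengths


def toy_step(tokens: Sequence[int], mask: Sequence[int]) -> int:
    # Hypothetical stand-in for the converted decoder: emits 10, 11, 12, then EOS.
    n = sum(mask)
    return EOS_ID if n >= 4 else 9 + n


out, lengths = greedy_decode_fixed_shape(toy_step)
print(out)      # [1, 10, 11, 12, 2]
print(lengths)  # [8, 8, 8, 8]
```

The cost of this design is wasted compute on padded positions. A real decoder would also apply its own causal mask internally, and a KV cache with fixed-size buffers is a common refinement of the same idea.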


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can the decode of transformer be accelerated by NPU #2612

🌱 Describe Feature Request

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Can the decode of transformer be accelerated by NPU #2612

Description

🌱 Describe Feature Request

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions