Sallyeen
This PR introduces several major enhancements to FramePack:

  • Local Service Deployment: Added support for running and invoking models as local services.
  • Distributed Inference Acceleration: Enabled parallelism with Ulysses and Ring Attention for faster distributed inference.
  • Parallel VAE Decode: Optimized VAE decoding by introducing parallel execution.
  • QKV Projection Fusion: Improved efficiency with optional fused QKV projection operations.
  • Torch.compile Integration: Leveraged torch.compile to further optimize runtime performance.
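
The QKV-fusion idea can be illustrated with a minimal numpy sketch (illustrative only, not FramePack's actual implementation; all names and shapes are hypothetical): the three attention projection weight matrices are concatenated so that a single wide matmul replaces three separate ones, which reduces kernel-launch overhead and improves GPU utilization.

```python
import numpy as np

# Hypothetical shapes for illustration; FramePack's real code differs.
rng = np.random.default_rng(0)
d_model = 8
x = rng.standard_normal((4, d_model))          # (seq_len, d_model)
w_q = rng.standard_normal((d_model, d_model))  # query projection
w_k = rng.standard_normal((d_model, d_model))  # key projection
w_v = rng.standard_normal((d_model, d_model))  # value projection

# Unfused: three separate projections (three matmuls).
q, k, v = x @ w_q, x @ w_k, x @ w_v

# Fused: concatenate the weights once, do a single wide matmul, then split.
w_qkv = np.concatenate([w_q, w_k, w_v], axis=1)  # (d_model, 3 * d_model)
q2, k2, v2 = np.split(x @ w_qkv, 3, axis=1)

# The fused path is numerically equivalent to the unfused one.
assert np.allclose(q, q2) and np.allclose(k, k2) and np.allclose(v, v2)
```

In PyTorch this is typically done by merging three `nn.Linear` layers into one with triple the output features; `torch.compile` can then further fuse the surrounding elementwise ops around the single projection.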

Mia and others added 4 commits August 26, 2025 17:03
@Sallyeen (Author)

Performance Improvement

  • Before optimization: Single RTX 4090, deployment time ≈ 237s
  • After optimization: 8× RTX 4090, deployment time ≈ 49s
  • Speedup: ~4.8× faster
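
The quoted speedup follows directly from the two reported timings:

```python
# Speedup implied by the reported deployment times (237 s -> 49 s).
before_s, after_s = 237, 49
speedup = before_s / after_s
print(f"~{speedup:.1f}x")  # prints "~4.8x"
```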

Notes

These improvements have been tested in both local and distributed environments and show significant gains in deployment efficiency and inference scalability.

Finally, I would like to sincerely thank you and the community for your outstanding work on this project.
It provides an excellent foundation for further optimization, and I hope this PR can be helpful in return.
Looking forward to your feedback!
