Thank you very much for the great work. The model uses a Sparse Transformer, which could be implemented with the efficient block-sparse attention kernels mentioned in the paper. Would those kernels improve memory and compute efficiency, and if so, by how much? I noticed you have released the code in TensorFlow. Looking forward to the good news.
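For context, here is a minimal NumPy sketch of what a block-sparse attention pattern computes, assuming a simple band-diagonal block layout; `block_sparse_attention` and the layout are illustrative, not the actual fused kernels from the paper or this repo. The point is that cost scales with the number of active blocks rather than with `seq_len**2`:

```python
import numpy as np

def block_sparse_attention(q, k, v, block_size, layout):
    """Attention computed only over blocks marked active in `layout`.

    q, k, v: [seq_len, dim]; layout: [n_blocks, n_blocks] boolean mask.
    Illustrative sketch: compute/memory scale with the number of active
    blocks, not seq_len**2 (real kernels fuse this on the GPU).
    """
    seq_len, dim = q.shape
    n_blocks = seq_len // block_size
    out = np.zeros_like(v)
    scale = 1.0 / np.sqrt(dim)
    for i in range(n_blocks):
        qi = q[i * block_size:(i + 1) * block_size]  # query block i
        # gather only the key/value blocks this query block attends to
        cols = [j for j in range(n_blocks) if layout[i, j]]
        ki = np.concatenate([k[j * block_size:(j + 1) * block_size] for j in cols])
        vi = np.concatenate([v[j * block_size:(j + 1) * block_size] for j in cols])
        scores = qi @ ki.T * scale
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over kept keys only
        out[i * block_size:(i + 1) * block_size] = weights @ vi
    return out

# Band-diagonal layout: each block attends to itself and its left neighbor,
# so the work grows linearly in seq_len instead of quadratically.
seq_len, block_size, dim = 16, 4, 8
n_blocks = seq_len // block_size
layout = np.eye(n_blocks, dtype=bool) | np.eye(n_blocks, k=-1, dtype=bool)
q, k, v = (np.random.randn(seq_len, dim) for _ in range(3))
print(block_sparse_attention(q, k, v, block_size, layout).shape)  # (16, 8)
```

With a fixed bandwidth the active-block count is O(n_blocks), so both the score matrix and the softmax memory shrink from quadratic to roughly linear in sequence length; how much that translates into wall-clock speedup depends on the kernel implementation, which is exactly the question above.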