I built CLBlast for Android. It runs on the Adreno(tm) 740, but HGEMM (FP16) does not show a significant speedup over FP32. For example, running
/clblast_client_xgemm --m 4096 --n 4096 --k 4096 --precision 16 --device 0 --platform 0
gives 604.8 GFLOPS, whereas running
/clblast_client_xgemm --m 4096 --n 4096 --k 4096 --precision 32 --device 0 --platform 0
gives 462.8 GFLOPS.
Is that correct? I expected HGEMM to reach about 1 TFLOPS.

It could well be that your hardware is slower in FP16 than in FP32, even though using less data saves memory bandwidth. However, it could also be that the CLBlast FP16 code is sub-optimal. I suggest compiling and running the tuners (see the docs), in particular for FP16, and perhaps even for the 4K x 4K matrices you are interested in. That should reveal whether your device can achieve 1 TFLOPS.
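The reported figures can be sanity-checked with a little arithmetic. The following is a minimal sketch. It assumes the common convention that an M x N x K GEMM costs 2*M*N*K floating-point operations. The thread does not state how the client counts them. The `min_bytes` helper is a hypothetical ideal-reuse lower bound on memory traffic, where A and B are read once and C is written once. CLBlast does not report this quantity.

```python
# Back-of-the-envelope check of the HGEMM/SGEMM numbers from the issue.
# Assumption: an M x N x K GEMM is counted as 2*M*N*K operations
# (one multiply and one add per inner-product term).

M = N = K = 4096

# Throughputs reported by clblast_client_xgemm in the issue, in GFLOPS.
MEASURED_GFLOPS = {"FP16 (HGEMM)": 604.8, "FP32 (SGEMM)": 462.8}


def gemm_flops(m: int, n: int, k: int) -> int:
    """Operation count for C = A @ B with A of shape (m, k) and B of shape (k, n)."""
    return 2 * m * n * k


def runtime_seconds(flops: int, gflops: float) -> float:
    """Run time implied by a throughput measured in GFLOPS."""
    return flops / (gflops * 1e9)


def min_bytes(m: int, n: int, k: int, bytes_per_elem: int) -> int:
    """Hypothetical lower bound on memory traffic: A and B read once, C written once."""
    return bytes_per_elem * (m * k + k * n + m * n)


def main() -> None:
    flops = gemm_flops(M, N, K)
    print(f"work: {flops / 1e9:.1f} GFLOP")

    for name, gflops in MEASURED_GFLOPS.items():
        ms = runtime_seconds(flops, gflops) * 1e3
        print(f"{name}: {gflops} GFLOPS -> about {ms:.0f} ms per call")

    speedup = MEASURED_GFLOPS["FP16 (HGEMM)"] / MEASURED_GFLOPS["FP32 (SGEMM)"]
    print(f"FP16 over FP32 speedup: {speedup:.2f}x")

    # Element sizes: half = 2 bytes, float = 4 bytes.
    for name, size in (("FP16", 2), ("FP32", 4)):
        intensity = flops / min_bytes(M, N, K, size)
        print(f"{name} ideal arithmetic intensity: {intensity:.0f} FLOP/byte")


if __name__ == "__main__":
    main()
```

With the reported numbers, the script prints about 137.4 GFLOP of work and roughly 227 ms (FP16) versus 297 ms (FP32) per call. That is a speedup of about 1.31x. The ideal-reuse intensities come out to about 1365 (FP16) and 683 (FP32) FLOP/byte. Real kernels reload tiles of A and B, so their actual traffic is higher and depends on the tile sizes chosen. This is one reason running the tuners can change these results.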