Replies: 1 comment
-
This was used: https://github.com/mlfoundations/open_clip/blob/main/src/training/profiler.py ... BUT, it's not plug and play. With the torch MultiheadAttention module and/or F.scaled_dot_product_attention in use, you have to hack/disable things or modify fvcore (which isn't really being maintained) so that the attention is counted correctly... Also, not all papers mean FLOPs when they say FLOPs; sometimes the numbers are actually GMACs. The GFLOPs values here are true GFLOPs, though. I'm inclined to think the numbers here are good... the rule of thumb is roughly 2 * (12 * num_layers * dim^2) * num_tokens FLOPs (about 2 FLOPs per parameter per token), and that comes out to ~40 GFLOPs for the B/16.
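As a sanity check, here is that rule of thumb spelled out in a few lines of Python (a rough sketch only; the tower shapes below are the standard CLIP ViT-B/16 configuration, and the estimate ignores the attention score matmuls):

```python
# Rough sanity check of the ~40 GFLOPs figure for CLIP ViT-B/16.
# Assumed shapes (standard CLIP config): vision tower 12 layers / dim 768 / 197 tokens
# at 224px input, text tower 12 layers / dim 512 / 77 tokens.

def tower_gflops(num_layers, dim, num_tokens):
    params = 12 * num_layers * dim ** 2       # rough transformer parameter count
    return 2 * params * num_tokens / 1e9      # ~2 FLOPs per parameter per token

vision = tower_gflops(12, 768, 197)   # 14x14 patches + class token
text = tower_gflops(12, 512, 77)      # context length 77

print(f"vision ~{vision:.1f}, text ~{text:.1f}, total ~{vision + text:.1f} GFLOPs")
# -> roughly 33.5 + 5.8 ≈ 39 GFLOPs, in the same ballpark as the 41.09 in model_profile.csv
```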
-
Hi, I want to know how you profile the CLIP models in https://github.com/mlfoundations/open_clip/blob/main/docs/model_profile.csv, because I can't match those numbers with the tools I've tried (e.g. torchsummaryX, thop, and torchinfo). In fact, I got very different results. Among them, I think the closest result to the FLOPs reported in the CLIP paper, Learning Transferable Visual Models From Natural Language Supervision, is the one from torchinfo, which is 14.04 GFLOPs (mult-adds). I also tried the code provided by @jongwook (openai/CLIP#143 (comment)), but it gave a result of over 161 GFLOPs. According to the model profile provided by this repo, the computational complexity of CLIP with ViT-B/16 should be 41.09 GFLOPs. What profiling tool or library do you use to get these numbers? Kindly help me solve this problem.
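For reference, here is a minimal sketch of an fvcore-based count on an open_clip model, roughly what the linked profiler.py boils down to (a sketch under assumptions, not the actual script; it assumes open_clip and fvcore are installed). fvcore counts one multiply-add as a single "flop", so its totals are effectively GMACs, and attention done via nn.MultiheadAttention / F.scaled_dot_product_attention may be skipped or undercounted without extra handles, which is one reason different tools disagree:

```python
# Sketch: count multiply-adds for a CLIP ViT-B/16 with fvcore.
# Caveats: fvcore counts one multiply-add as one "flop" (so this is ~GMACs; x2 for FLOPs),
# and it warns about / skips unsupported ops such as scaled_dot_product_attention,
# which is why attention needs custom handling for accurate totals.
import torch
import open_clip
from fvcore.nn import FlopCountAnalysis

model = open_clip.create_model('ViT-B-16')   # random weights are fine for counting
model.eval()

image = torch.randn(1, 3, 224, 224)                        # one 224x224 image
text = torch.randint(0, 49408, (1, 77), dtype=torch.long)  # one tokenized caption

with torch.no_grad():
    macs = FlopCountAnalysis(model, (image, text)).total()

print(f"~{macs / 1e9:.2f} GMACs  (~{2 * macs / 1e9:.2f} GFLOPs)")
```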