Replies: 1 comment
-
This was used: https://github.com/mlfoundations/open_clip/blob/main/src/training/profiler.py ... BUT, it's not plug and play. With the torch MultiheadAttention module and/or F.scaled_dot_product_attention in use, you have to hack/disable things or modify fvcore (which isn't really being maintained) so that the attention is counted correctly... Also, not all papers mean FLOPs when they say FLOPs; sometimes the numbers are actually GMACs. The GFLOPs values here are true GFLOPs, though. I'm inclined to think the numbers here are good... the rule of thumb is roughly 2 * (12 * num_layers * dim^2) * num_tokens FLOPs (about 2 FLOPs per parameter per token), and that comes out to ~40 GFLOPs for the B/16.
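As a sanity check, here is that rule of thumb spelled out in a few lines of Python (a rough sketch only; the tower shapes below are the standard CLIP ViT-B/16 configuration, and the estimate ignores the attention score matmuls):

```python
# Rough sanity check of the ~40 GFLOPs figure for CLIP ViT-B/16.
# Assumed shapes (standard CLIP config): vision tower 12 layers / dim 768 / 197 tokens
# at 224px input, text tower 12 layers / dim 512 / 77 tokens.

def tower_gflops(num_layers, dim, num_tokens):
    params = 12 * num_layers * dim ** 2       # rough transformer parameter count
    return 2 * params * num_tokens / 1e9      # ~2 FLOPs per parameter per token

vision = tower_gflops(12, 768, 197)   # 14x14 patches + class token
text = tower_gflops(12, 512, 77)      # context length 77

print(f"vision ~{vision:.1f}, text ~{text:.1f}, total ~{vision + text:.1f} GFLOPs")
# -> roughly 33.5 + 5.8 ≈ 39 GFLOPs, in the same ballpark as the 41.09 in model_profile.csv
```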
-
Hi, I want to know how you profile the CLIP models in https://github.com/mlfoundations/open_clip/blob/main/docs/model_profile.csv, because I can't match those numbers with the tools I've tried (e.g. torchsummaryX, thop, and torchinfo). In fact, I got very different results. Among them, I think the closest result to the FLOPs reported in the CLIP paper, Learning Transferable Visual Models From Natural Language Supervision, is the one from torchinfo, which is 14.04 GFLOPs (mult-adds). I also tried the code provided by @jongwook (openai/CLIP#143 (comment)), but it gave a result of over 161 GFLOPs. According to the model profile provided by this repo, the computational complexity of CLIP with ViT-B/16 should be 41.09 GFLOPs. What profiling tool or library do you use to get these numbers? Kindly help me solve this problem.
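For reference, here is a minimal sketch of an fvcore-based count on an open_clip model, roughly what the linked profiler.py boils down to (a sketch under assumptions, not the actual script; it assumes open_clip and fvcore are installed). fvcore counts one multiply-add as a single "flop", so its totals are effectively GMACs, and attention done via nn.MultiheadAttention / F.scaled_dot_product_attention may be skipped or undercounted without extra handles, which is one reason different tools disagree:

```python
# Sketch: count multiply-adds for a CLIP ViT-B/16 with fvcore.
# Caveats: fvcore counts one multiply-add as one "flop" (so this is ~GMACs; x2 for FLOPs),
# and it warns about / skips unsupported ops such as scaled_dot_product_attention,
# which is why attention needs custom handling for accurate totals.
import torch
import open_clip
from fvcore.nn import FlopCountAnalysis

model = open_clip.create_model('ViT-B-16')   # random weights are fine for counting
model.eval()

image = torch.randn(1, 3, 224, 224)                        # one 224x224 image
text = torch.randint(0, 49408, (1, 77), dtype=torch.long)  # one tokenized caption

with torch.no_grad():
    macs = FlopCountAnalysis(model, (image, text)).total()

print(f"~{macs / 1e9:.2f} GMACs  (~{2 * macs / 1e9:.2f} GFLOPs)")
```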